DUMMY VARIABLES AND THE ANALYSIS OF COVARIANCE


Introduction 


Until the last few years, most regression studies have been constructed around either time series data or cross section data. As the art of model building developed, considerable interest arose in testing hypotheses that required pooling data from several cross sections, e.g., data on several firms in each of many years. A well-known problem in time series analysis had been that of autocorrelated errors, and this phenomenon was known to have its counterpart in intertemporal, or pooled cross section, analysis. Regressions run on pooled cross section data assume that the error terms are independent drawings, while in fact the same elements are often sampled in each of the cross section years. If the errors are not independent, that is, if there are consistent components to the error term of each element every time it is sampled, it has been shown that the unexplained variance and the estimates of the slope coefficients that arise from the usual regression techniques are in error. Carter [2] showed that including in the pooled regression equation a discrete (mutually exclusive and exhaustive) zero-one dummy variable for each cell would yield unbiased estimates of unexplained variance and slope coefficients, and thus retain the desired properties of classical regression analysis.
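As a minimal sketch (Python with NumPy is our choice; the paper gives no code), Carter's discrete zero-one dummies can be built from a vector of cell labels. The function name `cell_dummies` and the layout of I = 3 firms observed in J = 4 years are hypothetical.

```python
import numpy as np


def cell_dummies(cell_ids, n_cells):
    """Return an (N x n_cells) zero-one matrix whose column k is 1 exactly
    for the observations that belong to cell k."""
    cell_ids = np.asarray(cell_ids)
    dummies = np.zeros((cell_ids.size, n_cells))
    dummies[np.arange(cell_ids.size), cell_ids] = 1.0
    return dummies


# Hypothetical layout: I = 3 firms (cells), each sampled in J = 4 years.
I, J = 3, 4
cells = np.repeat(np.arange(I), J)   # firm label of every observation
D = cell_dummies(cells, I)

# Mutually exclusive and exhaustive: each row contains exactly one 1.
print(D.sum(axis=1))
# Each column counts the J observations of its cell.
print(D.sum(axis=0))
```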

Until recently, these techniques have required such large computational capacity that they were of little practical interest. With the advent of more exotic computation facilities, however, some earlier statistical devices which require the pooling of data from cross sections and time series have come to be of real use.

During this same time, the dummy variable technique suggested by Carter has been shown to have further interest in a paper by Suits. Suits [6] suggests that the coefficients of these dummy variables can be of significant interest in and of themselves. That is, the coefficients associated with the dummy variables specifically account for those peculiarities associated with each cell that are not "explained" by the included independent variables.


While the inclusion of dummy variables may be a device to improve the quality of estimates in a pooled cross section analysis, it should be noted that the validity of conclusions drawn from such analyses depends upon a typically untested assumption. Dummy variable regressions assume that the intra cell slopes are identical and test the hypothesis that the cells have homogeneous intercepts. If that hypothesis is rejected, the coefficients of the dummy variables are taken as estimates of the intercell differences. In the language of the analysis of covariance, dummy variable regressions assume that there is no difference in within cell relations (slopes) and that the only source of variation is differences between the cells, differences reflected in their intercepts. Put in this way, it becomes clear that dummy variable regressions are a kind of analysis of covariance: a partitioning of explained variance into a within cell and a between cell component.
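The statement that a dummy variable regression fits one common within cell slope can be checked numerically. The sketch below (Python/NumPy, simulated data; the function names are ours, not the paper's) fits one intercept per cell plus a common slope, and compares that slope with the slope from regressing cell-demeaned y on cell-demeaned x. The two coincide (the Frisch-Waugh-Lovell result), so the dummy variable slope uses only within cell variation.

```python
import numpy as np


def dummy_variable_slope(y, x, cells, n_cells):
    """Common slope from y = a_i + b_w x + e, fitted by least squares with
    one zero-one dummy per cell and no separate constant."""
    D = np.zeros((y.size, n_cells))
    D[np.arange(y.size), cells] = 1.0
    X = np.column_stack([D, x])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef[-1]


def within_cell_slope(y, x, cells, n_cells):
    """Slope from regressing cell-demeaned y on cell-demeaned x."""
    counts = np.bincount(cells, minlength=n_cells)
    y_dev = y - (np.bincount(cells, weights=y, minlength=n_cells) / counts)[cells]
    x_dev = x - (np.bincount(cells, weights=x, minlength=n_cells) / counts)[cells]
    return (x_dev @ y_dev) / (x_dev @ x_dev)


# Simulated pooled data: cell means of x and y differ, within slope is -0.5.
rng = np.random.default_rng(0)
I, J = 5, 8
cells = np.repeat(np.arange(I), J)
x = rng.normal(size=I * J) + cells
y = 1.0 + 3.0 * cells - 0.5 * x + rng.normal(scale=0.1, size=I * J)
print(dummy_variable_slope(y, x, cells, I), within_cell_slope(y, x, cells, I))
```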

It is the purpose of this article to suggest that the technique of the analysis of covariance dominates, in a statistical sense, the use of dummy variables. We shall first show how dummy variable regressions and the analysis of covariance are related. This comparison will show why we think the analysis of covariance is a more useful technique than the use of dummy variables in a regression analysis. Then, in an attempt to make our prescription of the analysis of covariance somewhat easier to apply, we shall develop a set of computational short cuts. In our view, these short cuts make the extra computational cost so small that the analysis of covariance seems clearly to be a much more useful technique than simply inserting dummy variables into regressions.

In what follows we will present the test for the significance of the inclusion of a set of dummy variables in a regression equation; the tests required in the analysis of covariance; and then contrast these two procedures. Finally, we shall suggest a set of problems for which the analysis of covariance framework, which partitions variance into between and within components, can be used to separate long and short run relations between variables.
The Test for the Significance of a Set of Dummy Variables

The test for the significance of a set of dummy variables is exactly the same as a test for the significance of any set of regular (continuous) included variables. It results from a partitioning of the total variance of the dependent variable into that due to the regression on the regular independent variables, that due to the inclusion of the dummy variables, and a residual.*

Let $y_{ij}$ and $x_{ij}$ be the jth pair of observations in the ith cell (group) ($i = 1, 2, 3, \ldots, I$; $j = 1, 2, 3, \ldots, J$) of the dependent variable and the independent variable,** respectively. The classical regression model, without dummy variables, is

$$y_{ij} = \alpha + \beta x_{ij} + \epsilon_{ij}. \tag{1}$$

If $f_i$ is the effect of the ith cell, then the model is

$$y_{ij} = \alpha_0 + f_i + \beta_w x_{ij} + \epsilon_{ij}, \tag{2}$$

or***

$$y_{ij} = \alpha_i + \beta_w x_{ij} + \epsilon_{ij}, \tag{2'}$$

where $\alpha_i$ is the sum of $\alpha_0$ and $f_i$.

In fact, when dummy variables are used in a regression equation, they are added as I-1**** additional independent variables, as shown in equation (2''):

$$y_{ij} = \beta_0 + \beta_w x_{ij} + \sum_{k=1}^{I-1} \gamma_k \delta_{ik} + \epsilon_{ij}, \tag{2''}$$

where $\delta_{ik} = 0$ if $i \neq k$ and $\delta_{ik} = 1$ if $i = k$.
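Footnote **** notes that one dummy must be dropped. A small numerical check (Python/NumPy; the data and the helper `design_matrix` are illustrative assumptions) shows why: with a constant and all I dummies the columns are linearly dependent, because the dummies add up to the constant column, whereas the equation (2'') layout with I-1 dummies has full column rank.

```python
import numpy as np


def design_matrix(x, cells, n_cells, drop_one=True):
    """Columns: constant, x, and the dummies delta_ik. With drop_one=True the
    dummy for the last cell is omitted, leaving I-1 dummies as in (2'')."""
    D = np.zeros((x.size, n_cells))
    D[np.arange(x.size), cells] = 1.0
    if drop_one:
        D = D[:, :-1]
    return np.column_stack([np.ones(x.size), x, D])


I, J = 4, 5
cells = np.repeat(np.arange(I), J)
x = np.arange(I * J, dtype=float) % 3 + 0.1 * cells   # arbitrary x, varies within cells

full = design_matrix(x, cells, I, drop_one=False)
reduced = design_matrix(x, cells, I, drop_one=True)
print(full.shape[1], np.linalg.matrix_rank(full))        # 6 columns, rank 5
print(reduced.shape[1], np.linalg.matrix_rank(reduced))  # 5 columns, rank 5
```

Any one cell can serve as the omitted (reference) cell; the fitted values, and hence the residual sum of squares, are the same whichever dummy is dropped.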

The test for the statistical significance of this set of dummy variables is summarized in Table 1.


* Analysis of variance in regression contexts is discussed in statistics and econometrics texts, such as Bryant [1] and Fraser [3].

** In order to keep the notation clear, we shall include only one x in the equations. The analysis is easily extended to more than one x and, in fact, all the tables refer to equations with n-1 x's.

*** As there will be need for several β's, this one has a subscript w to signify that it is the slope within the individual cells.

**** Although there are I groups, one dummy variable has to be omitted from a regression equation in order to obtain determinate estimates of the parameters, as suggested by Suits [6].
TABLE 1

Source                          Computational Short Cut of SS           df              MS
Reduction due to x's            SS(y) - SSR(A)                          n-1*            (1)
Additional reduction due        [SS(y) - SSR(D)] - [SS(y) - SSR(A)]     I-1             (2)
  to dummy variables              = SSR(A) - SSR(D)
Residual                        SSR(D)                                  IJ-n-(I-1)      (6a)
Total                           SS(y)                                   IJ-1

The direct computation of the sum of squares for Column 1 of Table 1 is tedious, especially in a multiple regression, and hence a computational short cut is quite desirable. Column 2 suggests such a short cut. In addition, this representation displays the contribution of the dummy variables more clearly and will be of considerable interest later.

Define:

SS(y) = $\sum_{i=1}^{I}\sum_{j=1}^{J} (y_{ij} - \bar{y})^2$, i.e., the total sum of squares of the dependent variable, with IJ-1 degrees of freedom.

SSR(A) = $\sum_{i=1}^{I}\sum_{j=1}^{J} (y_{ij} - a - b x_{ij})^2$, i.e., the sum of squared residuals from a classical regression (through all the data), with IJ-n degrees of freedom.


* n is the number of independent variables (constant included).

** An algebraic expression for these sums of squares can be found in most statistics textbooks as a part of their discussion of regression analysis. See Bryant [1], pp. 214-216, or Fraser [3], pp. 296-304.
SSR(D) = $\sum_{i=1}^{I}\sum_{j=1}^{J} (y_{ij} - a_i - b_w x_{ij})^2$, i.e., the sum of squared residuals from a regression with dummy variables included, with IJ - n - (I - 1) degrees of freedom.


It should be clear that SSR(D) measures the unexplained or residual portion of the variation. Also, the combined reduction due to the x's and the Z's (the dummy variables) is SS(y) - SSR(D). Since the reduction due to the inclusion of the x's is SS(y) - SSR(A), the incremental reduction due to the inclusion of the Z's is [SS(y) - SSR(D)] - [SS(y) - SSR(A)]. That is, it is the reduction due to the combined effects of the x's and the Z's minus the effect of the x's. Simplifying, this becomes SSR(A) - SSR(D). In other words, the contribution of the dummy variables is measured by the difference between the residual sum of squares after the inclusion of the x's, but before the inclusion of the dummy variables, and the residual sum of squares after the inclusion of both the x's and the dummy variables.
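The Table 1 short cuts need only two least-squares fits. The sketch below (Python/NumPy, one x so that n = 2, simulated data; the helper names `rss` and `table1` are ours) computes SS(y), SSR(A), SSR(D) and the F ratio MS(2)/MS(6a). It fits SSR(D) with all I dummies and no separate constant, which gives the same fitted values as a constant plus I-1 dummies.

```python
import numpy as np


def rss(X, y):
    """Residual sum of squares from the least-squares fit of y on X."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    return float(resid @ resid)


def table1(y, x, cells, n_cells):
    """Sums of squares, degrees of freedom and F ratio of Table 1 (one x)."""
    N, n = y.size, 2                                   # n counts the constant
    D = np.zeros((N, n_cells))
    D[np.arange(N), cells] = 1.0
    ss_y = float(((y - y.mean()) ** 2).sum())
    ssr_a = rss(np.column_stack([np.ones(N), x]), y)   # one common intercept
    ssr_d = rss(np.column_stack([D, x]), y)            # one intercept per cell
    df2 = n_cells - 1
    df6a = N - n - (n_cells - 1)
    ms2 = (ssr_a - ssr_d) / df2
    ms6a = ssr_d / df6a
    return {"SS(y)": ss_y, "SSR(A)": ssr_a, "SSR(D)": ssr_d,
            "df2": df2, "df6a": df6a, "F": ms2 / ms6a}


# Simulated data with strong cell effects (purely illustrative).
rng = np.random.default_rng(1)
I, J = 6, 10
cells = np.repeat(np.arange(I), J)
x = rng.normal(size=I * J)
y = 2.0 * cells + 0.8 * x + rng.normal(scale=0.5, size=I * J)
res = table1(y, x, cells, I)
print(res)
```

The returned F is then compared with a tabulated critical value of F(I-1, IJ-n-(I-1)); the sketch deliberately leaves the table lookup to the reader.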

Mean squares (Column 4) are obtained by dividing each SS by its corresponding degrees of freedom. In the dummy variable framework, the significance of the different intercepts is tested by the F ratio MS(2)/MS(6a). If the computed F ratio is larger than the critical F value, then the cell effects are judged to be nonzero and to contribute significantly to an understanding of the variation of the dependent variable.


The  Analysis  of  Covariance 

The analysis of covariance extends the regression equation with dummy variables to allow for the inclusion of differing cell slopes. It adds a third step in the progression of models, namely,

$$y_{ij} = \alpha_i + \beta_i x_{ij} + \epsilon_{ij}. \tag{3}$$

It results in a partitioning of the total sum of squares into the following components:
TABLE 2

Source                          Computational Short Cut for SS    df              MS
Due to x's                      SS(y) - SSR(A)                    n-1             (1)
Due to dummy variables:         SSR(A) - SSR(D)                   I-1             (2)
  different cell intercepts
Due to different cell slopes    SSR(D) - SSR(I)                   (I-1)(n-1)      (5)
Residual                        SSR(I)                            I(J-n)          (6b)
Total                           SS(y)                             IJ-1

where

SSR(I) = $\sum_{i=1}^{I}\sum_{j=1}^{J} (y_{ij} - a_i - b_i x_{ij})^2$, i.e., the sum of squared residuals from I regressions through each of the individual cells, with I(J-n) degrees of freedom in total.
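A sketch of the full Table 2 partition (Python/NumPy, one x so n = 2, equal cell sizes assumed; the data and the names `rss` and `table2` are hypothetical). The only new ingredient relative to Table 1 is SSR(I), obtained by fitting a separate intercept and slope inside every cell and adding up the residual sums of squares. The sums of squares and degrees of freedom of the rows add back to SS(y) and IJ-1.

```python
import numpy as np


def rss(X, y):
    """Residual sum of squares from the least-squares fit of y on X."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    return float(resid @ resid)


def table2(y, x, cells, n_cells):
    """Rows of Table 2 as {label: (SS, df)} for one x and equal cell size J."""
    N, n = y.size, 2
    J = N // n_cells
    D = np.zeros((N, n_cells))
    D[np.arange(N), cells] = 1.0
    ss_y = float(((y - y.mean()) ** 2).sum())
    ssr_a = rss(np.column_stack([np.ones(N), x]), y)
    ssr_d = rss(np.column_stack([D, x]), y)
    # SSR(I): a separate intercept and slope fitted inside every cell.
    ssr_i = 0.0
    for k in range(n_cells):
        mask = cells == k
        ssr_i += rss(np.column_stack([np.ones(mask.sum()), x[mask]]), y[mask])
    rows = {
        "(1) x's": (ss_y - ssr_a, n - 1),
        "(2) cell intercepts": (ssr_a - ssr_d, n_cells - 1),
        "(5) cell slopes": (ssr_d - ssr_i, (n_cells - 1) * (n - 1)),
        "(6b) residual": (ssr_i, n_cells * (J - n)),
    }
    return rows, ss_y, N - 1


# Simulated cells whose slopes really differ (illustrative only).
rng = np.random.default_rng(2)
I, J = 4, 8
cells = np.repeat(np.arange(I), J)
x = rng.normal(size=I * J)
slopes = np.array([0.5, 1.0, 1.5, 2.0])
y = cells + slopes[cells] * x + rng.normal(scale=0.3, size=I * J)
rows, ss_y, df_total = table2(y, x, cells, I)
for label, (ss, df) in rows.items():
    print(f"{label:22s} SS = {ss:9.4f}  df = {df}")
```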


In this presentation, the residual sum of squares in the dummy variable regression analysis has been split into two components. One is the reduction in unexplained variance due to allowing different within cell slopes, and the other becomes the new estimate of residual or unexplained variance.

Within the analysis of covariance framework, the hypotheses to be tested are, first, whether the within cell slopes are equal and, second, whether the cell intercepts differ. The test procedure is first to determine whether there is any evidence that the cell slopes differ. This test is performed by dividing the mean square contribution due to different cell slopes by the residual mean square, that is, by forming MS(5)/MS(6b). If there is no evidence that the slopes differ, i.e., no evidence that SSR(D) is significantly different from SSR(I), one next tests whether the introduction of many intercepts rather than a single intercept significantly reduces the sum of squares of residuals by forming the ratio MS(2)/MS(6b).*
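The two-step procedure can be written as a short function of the sums of squares alone. This is a sketch under stated assumptions (plain Python; the function name and the `crit_slopes` argument are ours, and the user must supply the critical F value from a table). It also computes the "pooled" error mean square of the footnote, which combines SS(5) and SS(6b) with their degrees of freedom and therefore equals SSR(D)/(IJ-n-(I-1)), i.e., MS(6a).

```python
def covariance_tests(ssr_a, ssr_d, ssr_i, I, J, n, crit_slopes):
    """Sequential tests of the Table 2 partition.

    Step 1: F = MS(5)/MS(6b) tests equal cell slopes.
    Step 2 (only if step 1 does not reject): F = MS(2)/MS(6b) tests equal
    intercepts; the pooled-error version of the footnote is reported too.
    """
    ms5 = (ssr_d - ssr_i) / ((I - 1) * (n - 1))
    ms6b = ssr_i / (I * (J - n))
    result = {"F_slopes": ms5 / ms6b}
    if result["F_slopes"] > crit_slopes:
        result["verdict"] = "cell slopes differ; intercept test not performed"
        return result
    ms2 = (ssr_a - ssr_d) / (I - 1)
    result["F_intercepts"] = ms2 / ms6b
    # Pooling SS(5) with SS(6b): the combined df equal IJ - n - (I - 1).
    pooled_ms = ((ssr_d - ssr_i) + ssr_i) / ((I - 1) * (n - 1) + I * (J - n))
    result["F_intercepts_pooled"] = ms2 / pooled_ms
    result["verdict"] = "common slope accepted; compare F_intercepts with F(I-1, .)"
    return result


# Sums of squares reported in Table 4 (I = 20, J = 12, n = 3); 1.72 is the
# .01 critical value listed there for the slope test.
table4 = covariance_tests(1.4051752, 1.2242131, 0.8250828, I=20, J=12, n=3,
                          crit_slopes=1.72)
print(table4)

# Hypothetical numbers, with a placeholder critical value, where slopes agree.
demo = covariance_tests(10.0, 5.0, 4.9, I=5, J=10, n=2, crit_slopes=2.6)
print(demo)
```

With the Table 4 figures the slope test rejects (F of about 2.29 against 1.72), so the sketch stops before the intercept test, matching the rule stated above.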

The  Relationship  Between  the  Use  of  Dummy 
Variables  and  the  Analysis  of  Covariance 

Having developed the tests implied in both regression analysis with dummy variables and the analysis of covariance, it is possible to see how these methods relate to each other. In both methods, the significance of the cell intercepts is tested by comparing the mean square reduction due to many as opposed to one intercept with an estimate of the residual mean square. The tests differ only in what is considered the residual mean square. The dummy variable regression procedure assumes the cell slopes are the same; that is, it assumes SSR(D) is equal to SSR(I). Thus it tests SSR(A) - SSR(D) against SSR(D). The analysis of covariance procedure does not assume SSR(D) equals SSR(I). In fact, it explicitly tests this hypothesis. If the hypothesis of equal cell slopes is rejected, the analysis of covariance procedure does not allow one to test the homogeneity of cell intercepts.

Within the analysis of covariance framework, the test for homogeneous intercepts is conditional on the cell slopes having been shown not to differ significantly from each other. Thus the analysis of covariance procedure does all that the dummy variable regression technique can do and, in addition, it explicitly tests the assumption of equal slopes which dummy variable regressions require but leave untested. In this sense the analysis of covariance dominates the dummy variable regression techniques.

In addition to the partitioning of explained variance shown in Table 2, there are more informative uses to which one can put the analysis of covariance. The Table 2 partitioning was employed to ease the comparison with regression analysis encompassing dummy variables. Table 3 shows a further possibility.*** In it the difference between SSR(A) and SSR(D) is further partitioned into two parts: SSR(M), and SSR(A) minus SSR(D) minus SSR(M). The SSR(M) term is the sum of squares of residuals from a regression through the cell means. Under this partitioning, as before, one tests for homogeneous within cell slopes by comparing MS(5) with MS(6b). Given that these intra-cell slopes are homogeneous, one tests for the significance of different cell intercepts in two steps. The first step is a test for the existence of a between cell relation. This is the purpose of the regression through the cell mean data. Its existence is measured by comparing MS(3) with MS(6b). If the regression through the cell means "fits" as well as the individual within cell regressions, the between cell relation is said to exist.

Given that this between cell relation exists, and that the intra-cell slopes are homogeneous (that is, that a common intra-cell slope exists), one compares the two slopes by comparing MS(4) with MS(6b). If the slope of the regression through the cell means differs from the slope of the regression within the cells, it is said that there exist different cell intercepts.

TABLE 3

Source                          Computational Short Cut for SS    df              MS
Due to x's                      SS(y) - SSR(A)                    n-1             (1)
Due to cell mean relation       SSR(M)                            I-n             (3)
Due to the difference of        SSR(A) - SSR(D) - SSR(M)          n-1             (4)
  within and between cell
  slopes
Due to different cell slopes    SSR(D) - SSR(I)                   (I-1)(n-1)      (5)
Residuals                       SSR(I)                            I(J-n)          (6b)
Total                           SS(y)                             IJ-1

* Some statisticians recommend "pooling". If it is concluded that SSR(D) = SSR(I), then MS(5) and MS(6b) would be combined to form an "error" MS based on (I-1)(n-1) more degrees of freedom, namely IJ-n-(I-1) degrees of freedom.

** The only additional computation required is a set of regressions run through the data for each cell. This would seem to come at little cost with modern computers and regression programs.

*** This method of partitioning the variance can be found in Mood [5], pp. 350-356.
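A sketch of the Table 3 split (Python/NumPy, one x, simulated data; the function and variable names are ours). Two assumptions are flagged: cells are of equal size J, and the regression through the cell means is weighted by J (each mean counted J times) so that SSR(M) is on the same scale as the pooled sums of squares; the paper does not spell out this weighting. Under these assumptions, row (4), SSR(A) - SSR(D) - SSR(M), equals $(b_w - b_b)^2 / (1/W_{xx} + 1/B_{xx})$, where $b_w$ and $b_b$ are the within and between slopes and $W_{xx}$, $B_{xx}$ the within and between sums of squares of x. This makes explicit that row (4) measures the gap between the two slopes.

```python
import numpy as np


def rss(X, y):
    """Residual sum of squares from the least-squares fit of y on X."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    return float(resid @ resid)


def between_within_split(y, x, cells, n_cells):
    """Split SSR(A) - SSR(D) into SSR(M) and the slope-difference term."""
    N = y.size
    J = N // n_cells                                  # equal cell sizes assumed
    D = np.zeros((N, n_cells))
    D[np.arange(N), cells] = 1.0
    ssr_a = rss(np.column_stack([np.ones(N), x]), y)
    ssr_d = rss(np.column_stack([D, x]), y)
    ybar = np.bincount(cells, weights=y, minlength=n_cells) / J
    xbar = np.bincount(cells, weights=x, minlength=n_cells) / J
    # Regression through the cell means, weighted by J.
    ssr_m = J * rss(np.column_stack([np.ones(n_cells), xbar]), ybar)
    ss4 = ssr_a - ssr_d - ssr_m                       # row (4) of Table 3

    # Within slope b_w and between (cell-mean) slope b_b.
    xd, yd = x - xbar[cells], y - ybar[cells]
    w_xx = xd @ xd
    b_w = (xd @ yd) / w_xx
    xm, ym = xbar - xbar.mean(), ybar - ybar.mean()
    b_xx = J * (xm @ xm)
    b_b = (xm @ ym) / (xm @ xm)
    closed_form = (b_w - b_b) ** 2 / (1.0 / w_xx + 1.0 / b_xx)
    return {"SSR(M)": ssr_m, "SS(4)": ss4, "closed_form": closed_form,
            "b_w": b_w, "b_b": b_b}


# Hypothetical process: within slope -1, while firm means follow a positive slope.
rng = np.random.default_rng(3)
I, J = 8, 6
cells = np.repeat(np.arange(I), J)
firm_x = rng.normal(size=I) * 2.0
x = firm_x[cells] + rng.normal(size=I * J)
y = 3.0 * firm_x[cells] - 1.0 * x + rng.normal(scale=0.3, size=I * J)
out = between_within_split(y, x, cells, I)
print(out)
```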
This test is an extremely useful one for testing the hypothesis that, within a body of pooled cross section data, the within cell regression relation differs from the between cell regression relation. This last specification of the analysis of covariance is more subtle than that in Table 2, as it specifically tests first for the existence of a homogeneous within cell relation and for the existence of a between cell regression relation, and only then compares the two to see if they are equal. That is, it will not offer evidence that the between cell relation differs from the within cell relation unless it is satisfied that both relationships actually exist.

At this point, a graphical presentation of what the analysis of covariance procedure enables one to test may be useful.

Suppose  one  ran  a  regression  between  a  dependent  variable  y  and  an 
independent  variable  x  where  y  and  x  are  data  from  several  firms  in  each 
of  several  years.  Let  the  diagram  in  Figure  1  stand  for  the  estimated 
regression  relation  between  y  and  x. 

FIGURE  1 


This regression relation could arise from any one of the possibilities shown in Figures 2 through 5, where each oval represents the cluster of data for one firm.
FIGURE 2

FIGURE 3

FIGURE 4

FIGURE 5
In Figure 2 the within firm regression relation between y and x is the same as the between firm relation. In Figure 3 the within firm regression relation is quite different from the between firm regression relation. This is a situation in which regression analysis using dummy variables would be acceptable. Figure 4, on the other hand, presents a situation where there is some evidence of a between firm regression relation but nonhomogeneous relations of y to x among firms. Figure 5 presents the most perverse of the possibilities. It is meant to suggest a generally upward sloping relation between y and x without a significant between firm regression relation and with heterogeneous within firm relations. Given a regression relation as in Figure 1, it is the purpose of the analysis of covariance technique to distinguish whether the underlying process is that in Figure 2 or one of those in Figures 3 through 5.
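The Figure 3 situation can be reproduced with a tiny constructed data set (Python/NumPy; the numbers are invented for illustration). Every firm has the same within firm slope of -1, while the firm means lie on a line of slope +2. A single regression through all the data, as in Figure 1, reports a slope of 1.25, which describes neither relation.

```python
import numpy as np

# Constructed "Figure 3" data: 5 firms, 3 observations each.
I, J = 5, 3
cells = np.repeat(np.arange(I), J)
offsets = np.tile([-1.0, 0.0, 1.0], I)   # within-firm deviations of x
x = cells + offsets                      # firm i has mean x equal to i
y = 2.0 * cells - offsets                # firm mean of y is 2i; within slope -1


def slope(u, v):
    """Least-squares slope of v on u (with an intercept)."""
    u = u - u.mean()
    v = v - v.mean()
    return float(u @ v / (u @ u))


xbar = np.array([x[cells == k].mean() for k in range(I)])
ybar = np.array([y[cells == k].mean() for k in range(I)])

pooled = slope(x, y)                     # regression through all the data
between = slope(xbar, ybar)              # regression through the firm means
xd, yd = x - xbar[cells], y - ybar[cells]
within = float(xd @ yd / (xd @ xd))      # common within-firm slope
print(pooled, between, within)           # pooled = 1.25, between = 2.0, within = -1.0
```

The pooled slope is a weighted average of the within and between slopes, weighted by the within and between sums of squares of x; that is why it can sit between two quite different relations.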

The next section of this paper presents an example of how this specification of the analysis of covariance procedure can exploit the data generated by an actual process for considerable information about its nature.

The  Analysis  of  Covariance  and  the  Framing  of  Hypotheses 

The kinds of problems discussed in this paper all concern pooled cross section data. The dummy variable regression technique concentrates interest on the within cell slope and the cell intercepts. The analysis of covariance framework, however, can be used to concentrate attention on a different aspect of the problem. It can be used to partition the variation explained by the regression equation into two parts: the variation arising from relations between the cell means and the variation arising from relations within the cells. For a kind of problem of significant interest to the authors, such a difference in focus is quite important. Suppose the pooled cross sections are data on the stock prices of a set of firms in each of several years, together with variables associated with the firms which are thought to affect these stock prices.
It seems sensible to hypothesize that variations in price may occur because one firm differs (on the average) from another, or because in a specific year one firm pursues policies different from its average policy. Thus the total variation being explained can be thought of as arising from variations between the firms and variations within the firms over the period studied. To be specific, it was hypothesized that stock prices responded to debt and dividend payout policies and that these responses were composed of two types of influences:

a. the influence of debt and dividend policies, which were said to be described by the averages of the variables, and

b. the influence of short run variation in debt and dividends around these desired or policy levels.

Thus in any specific year, the stock price of, say, Standard Oil of New Jersey is thought to differ from that of Texaco not only because Standard pursues different financial policies but also because, in that year, Standard and/or Texaco may have debt ratios or dividend payout ratios which differ from their target or average ratios due to the peculiarities of that year. Stated yet another way, variations in stock prices are thought to arise from variations in established financial policies between companies, and from within company year-to-year aberrations around these financial policies.

For such a problem, the analysis of covariance as presented in Table 3 seems a most appropriate statistical tool. First, within its framework it is possible to test for the existence of differences in stock prices which arise from between company differences. These are the differences thought to arise from the fact that different companies pursue different financial policies, which will be measured by the average of the variable in question over the time period studied. Second, it is possible to test for the existence of a response of stock prices to short run fluctuations in financial variables around their average or target level. Finally, it is possible to test whether the two relations of debt and dividends to stock prices, the long run and the short run, are equal. If, in fact, the evidence is such that one has reason to believe the data are generated by two sets of forces, short run and long run, there is considerable reason to question the validity of some other studies of the relation between stock prices and financial variables which made no attempt to isolate these two different types of influences.

The financial variables which affect stock prices are said to be the ratio of debt to total capitalization and the ratio of dividends to net profits. The process by which these two variables affect stock prices is said to be the following. In determining the price they are willing to pay for a particular security, investors formulate a return they require from the stock in order to justify holding this stock rather than some other. The return they expect is composed of dividends and price appreciation. The rate of return they realize is the dividend price ratio (the dividend yield) plus the percentage growth in price. Thus, given their required rate of return, stockholders estimate a growth rate and the prospective dividend per share and pay a price such that the prospective dividend yield plus capital gain is at least as high as their required rate of return. That is, given their expectation about the growth rate, g, and the dividend per share, DIV, stockholders set prices so that

$$\frac{DIV}{PRICE} + g = r,$$

where r is the return they require in order to hold this particular security.
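Rearranging the pricing condition gives PRICE = DIV/(r - g). The sketch below (plain Python; the function name and the dividend, growth and return figures are hypothetical) makes the direction of the debt argument that follows concrete: holding DIV and g fixed, a higher required return r means a lower price.

```python
def implied_price(div, g, r):
    """Price that makes DIV/PRICE + g equal the required return r.

    Rearranging DIV/PRICE + g = r gives PRICE = DIV / (r - g); the formula
    only makes sense when r exceeds g.
    """
    if r <= g:
        raise ValueError("required return r must exceed growth rate g")
    return div / (r - g)


# Hypothetical figures: $2.00 prospective dividend, 4% expected growth.
base = implied_price(2.00, 0.04, 0.09)      # required return of 9%
riskier = implied_price(2.00, 0.04, 0.10)   # e.g. more debt raises r to 10%
print(round(base, 2), round(riskier, 2))
```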

The required rate of return is taken to be higher if the company employs more debt to finance itself. The larger the share of the company's financing that requires fixed contractual payments (that is, the higher the debt ratio), the higher the risk, and thus the higher the required return is said to be. The role of the dividend payout ratio is more difficult to state a priori. Since the capital gains tax rate is lower than the income tax rate for most stockholders, low payout ratios would seem to cause stockholders to require lower before tax returns, as this would allow most of the return to come in the form of capital gains arising from the retention of earnings. On the other hand, the feeling that dividends now might be preferred to possible capital gains later might make higher payout ratios induce stockholders to accept lower returns because they think of them as more secure returns. Thus, we have no strong a priori views on the sign of the slope coefficient of the dividend payout ratio term.



The model as developed can be presented as

$$\left(\frac{DIV}{P} + g\right)_{ij} = \left(\frac{DIV}{P} + g\right)_{j} + \beta_0 + \beta_1 \left(\frac{D}{C}\right)_{ij} + \beta_2 \left(\frac{DIV}{PRO}\right)_{ij} + \epsilon_{ij},$$

where

$(DIV/P + g)_{ij}$ is the dividend yield plus growth rate, the total rate of return for the ith firm in the jth year;

$(DIV/P + g)_{j}$ is the average rate of return for all I firms in year j. It is a "market" rate of return for that year;

$(D/C)_{ij}$ is the ratio of debt to total capitalization, or of debt to debt plus equity, of the ith firm in the jth year;

$(DIV/PRO)_{ij}$ is the ratio of dividends to profits for the ith firm in the jth year.


That is, the required return for the ith firm in year j depends first on the state of the stock market in year j, as measured by the average required return for all stocks covered. In addition to this effect of the year, however, there are two effects which are peculiar to each company: that arising from its debt policy and that arising from its dividend policy.

To allow the pooling of the annual cross sections, the model will be written for testing purposes as


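$$\left(\frac{DIV}{P} + g\right)_{ij} - \left(\frac{DIV}{P} + g\right)_{j} = \beta_0 + \beta_1 \left(\frac{D}{C}\right)_{ij} + \beta_2 \left(\frac{DIV}{PRO}\right)_{ij} + \epsilon_{ij}$$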
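The two data transformations implied by this setup can be sketched as follows (Python/NumPy; the function names and the 3-firm by 4-year numbers are hypothetical, not the paper's food-industry data). The first subtracts each year's cross-firm average return, giving the left-hand side above. The second splits any firm-by-year variable into each firm's average (its "policy", or long run, level) and the year-to-year deviations around it (the short run component); these are the between and within parts that Table 3 analyzes.

```python
import numpy as np


def market_adjusted(returns):
    """Subtract each year's cross-firm average (column mean) from
    (DIV/P + g)_ij; rows are firms i, columns are years j."""
    return returns - returns.mean(axis=0, keepdims=True)


def long_short_split(values):
    """Split a firm-by-year array into each firm's average level (long run
    component) and the year-to-year deviations around it (short run)."""
    firm_mean = values.mean(axis=1, keepdims=True)
    return firm_mean, values - firm_mean


# Hypothetical returns for 3 firms over 4 years (not the paper's data).
returns = np.array([[0.09, 0.11, 0.10, 0.12],
                    [0.07, 0.08, 0.09, 0.08],
                    [0.12, 0.13, 0.11, 0.12]])
adjusted = market_adjusted(returns)
long_run, short_run = long_short_split(adjusted)
print(adjusted.sum(axis=0))   # each year's adjusted returns sum to (numerically) zero
```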

The sample used to test this relation is drawn from 20 firms in the food industry, with the data on each firm drawn from the years 1949 to 1960. A regression through all the data yielded the following results:
(4.1796)*      (.3554)      (4.3735)

R² = .0768**

F(2, 237) = 9.8645


In this regression, the correlation, while significant, is quite low. There seems to be no relation between debt ratios and required returns. Finally, the significant and negative coefficient of dividends suggests that prices rise as payout ratios rise.***

A regression which employed dummy variables, however, yielded somewhat different results. These were

$$\left(\frac{DIV}{P} + g\right)_{ij} - \left(\frac{DIV}{P} + g\right)_{j} = .2196 - .1275 \left(\frac{D}{C}\right)_{ij} - .3356 \left(\frac{DIV}{PRO}\right)_{ij} + f_i + \epsilon_{ij}$$

(7.064)      (3.0913)      (7.1064)

R² = .1957

F(21, 218) = 2.3264

In this equation, the correlation, while small, is significant. Moreover, there seem to be significant relations within each firm (each cell) between debt, dividends, and stock prices: the t ratios are quite significant. Surprisingly, higher debt ratios mean higher stock prices and, as before, higher dividend payout ratios mean higher stock prices.


* The numbers in parentheses are the "t" ratios associated with each slope coefficient.

** This is significantly different from zero at the .01 level.

*** This result is like that found by others who have studied stock prices. See, for instance, Myron Gordon [4].

**** Significantly different from zero at the .01 level.
If we now apply the analysis of covariance technique to this problem and specifically test first for the existence of a relation through the cell means and second for the existence of homogeneous within cell (within firm) relations between debt and dividends and stock prices, these first results are shown to be quite misleading. First, as can be seen in Table 4, there is evidence of the existence of a regression relation through the cell means. This follows from the fact that the cell mean regression fits as well as the individual cell regressions, that is, from the fact that MS(3) does not significantly differ from MS(6b). However, from the estimated regression equation through the cell mean data (bars denote each firm's average over the years),

$$\overline{\left(\frac{DIV}{P} + g\right)_{ij} - \left(\frac{DIV}{P} + g\right)_{j}} = .0179 + .0115\,\overline{\left(\frac{D}{C}\right)_{ij}} - [\text{illegible}]\,\overline{\left(\frac{DIV}{PRO}\right)_{ij}}$$

(.7848)      (.3443)      (.9596)

we see that the slope coefficients for the debt and dividend terms from this cell mean regression are not like those obtained from the dummy variable regression. These slope coefficients are different in sign, in magnitude, and in their statistical significance. They suggest that while debt and dividends explain part of the between firm variation in prices, they do not explain very much.

Second, the within cell relations, or what we call the short run relations of prices to debt and dividend fluctuations around their means, are not homogeneous among firms. That this is so can be seen from the comparison of MS(5) and MS(6b). For some firms, stock prices rise as dividend payout ratios rise, and for other firms stock prices fall.

Thus, while both the regression through all the data and the dummy variable regression gave evidence of a relation between dividend and debt ratios and stock prices, with dividends seeming to play a quite significant role, a closer examination shows little evidence of such a relation between firms. Furthermore, what within firm relation does exist is not homogeneous between firms. An application of the analysis of covariance procedure to this model of stock prices suggests that great care must be used in the interpretation of simple cross section studies and of any pooled cross section studies, such as those employing dummy variables, which do not explicitly test for homogeneous within cell behavior and the existence of a well defined between cell behavior. The results from the regressions which allowed each firm its own set of slope coefficients showed the results from the dummy variable regression to be most misleading.

TABLE 4

Source                           Computational Short Cut for SS   df                  MS              Computed F   Critical F (.01)
Due to x's                       SS(y) - SSR(A) = .1168949        3-1 = 2             (1)  .0584475   12.7509      4.78
Due to cell mean relation        SSR(M) = .0654840                20-3 = 17           (3)  .0038520     .8404      2.10
Due to the difference of within  SSR(A) - SSR(D) - SSR(M)         3-1 = 2             (4)  .0577391   12.5963      4.78
  and between cell slopes          = .1154781
Due to different cell slopes     SSR(D) - SSR(I) = .3991303       (20-1)(3-1) = 38    (5)  .0105034    2.2914      1.72
Residuals                        SSR(I) = .8250828                20(12-3) = 180      (6b) .0045838
Total                            SS(y) = 1.5220701                20 x 12 - 1 = 239

where SSR(M) = .0654840, SSR(A) = 1.4051752, SSR(D) = 1.2242131, and SSR(I) = .8250828.
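The Table 4 mean squares and F ratios follow from the four reported residual sums of squares by the short cuts of Table 3. The sketch below (plain Python; it uses only the published sums of squares and degrees of freedom, and the variable names are ours) recomputes them, together with the R² values quoted for the regression through all the data and for the dummy variable regression.

```python
# Sums of squares reported in Table 4 (I = 20 firms, J = 12 years, n = 3).
ss_y, ssr_a, ssr_d, ssr_i, ssr_m = 1.5220701, 1.4051752, 1.2242131, 0.8250828, 0.0654840
I, J, n = 20, 12, 3

rows = {
    "(1) x's": (ss_y - ssr_a, n - 1),
    "(3) cell mean relation": (ssr_m, I - n),
    "(4) within vs between slopes": (ssr_a - ssr_d - ssr_m, n - 1),
    "(5) different cell slopes": (ssr_d - ssr_i, (I - 1) * (n - 1)),
}
ms6b = ssr_i / (I * (J - n))          # residual mean square, row (6b)

f_ratios = {}
for name, (ss, df) in rows.items():
    ms = ss / df
    f_ratios[name] = ms / ms6b        # every row is tested against MS(6b)
    print(f"{name:30s} SS={ss:.7f} df={df:3d} MS={ms:.7f} F={f_ratios[name]:.4f}")

r2_pooled = (ss_y - ssr_a) / ss_y     # regression through all the data
r2_dummy = (ss_y - ssr_d) / ss_y      # regression with dummy variables
print(f"R^2 pooled = {r2_pooled:.4f}, R^2 dummy = {r2_dummy:.4f}")
```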

Conclusion 


The purpose of this article has been threefold. It has attempted to draw the link between regression analyses with dummy variables and the analysis of covariance. In addition, it has advanced the view that the analysis of covariance dominates dummy variable regressions in the examination of pooled cross section data: it generates more informative tests at what seems to be low cost. Finally, we have presented an application of the analysis of covariance to a kind of problem where its use makes it possible to separate long run and short run relationships between variables. This example attempted to isolate the long run and the short run relations of financial policies to stock prices. It was meant to serve as evidence that indiscriminate use of regression analysis with dummy variables can be most misleading, and that careful use of the analysis of covariance can be much more informative.